In computer science, a programming language is said to have first-class functions if it treats functions as first-class objects. Specifically, this means that the language supports passing functions as arguments to other functions, returning them as the values from other functions, and assigning them to variables or storing them in data structures.[1][2] Some authors require support for anonymous functions as well.[3] The term was coined by Christopher Strachey in the context of “functions as first-class citizens” in the mid-1960s.[4]
These features are a necessity for the functional programming style, in which (for instance) the use of higher-order functions is a standard practice. A simple example of a higher-ordered function is the map function, which takes as its arguments a function and a list, and returns the list formed by applying the function to each member of the list. For a language to support map, it must support passing a function as an argument.
There are certain implementation difficulties in passing functions as arguments and returning them as results, especially in the presence of non-local variables introduced in nested and anonymous functions. Historically, these were termed the funarg problems, the name coming from "function argument".[5] In early imperative languages these problems were avoided by either not supporting functions as result types (Algol, Pascal) or omitting nested functions and thus non-local variables (C). The early functional language Lisp took the approach of dynamic scoping, where non-local variables refer to the closest definition of that variable at the point where the function is executed, instead of where it was defined. Proper support for lexically-scoped first-class function was introduced in Scheme and requires handling references to functions as closures[4] instead of bare function pointers, which in turn makes garbage collection a necessity.
Contents |
In this section we compare how some concepts are handled in a functional language with first-class functions (Haskell) compared to an imperative language where functions are second-class citizens (C).
In languages where functions are first-class citizens, functions can be passed as arguments to other functions in the same way other types can (a function taking another function as argument is called a higher-order function). In the Haskell programming language:
map :: (a -> b) -> [a] -> [b] map f [] = [] map f (x:xs) = f x : map f xs
Languages where functions are not first-class often still allow one to write higher-order functions through the use of concepts such as function pointers or delegates. In the C programming language:
void map(int (*f)(int), int x[], size_t n) { for (int i = 0; i < n; i++) x[i] = (*f)(x[i]); }
When comparing the two samples one should note that there are a number of differences between the two approaches that are not directly related to the support of first-class functions. The Haskell sample operates on lists, while the C sample operates on arrays. Both are the most natural compound data structures in the respective languages and making the C sample operate on linked lists would have made it unnecessarily complex. This also accounts for the fact that the C functions needs an additional parameter (giving the size of the array.) The C function updates the array in-place, returning no value, whereas in Haskell data structures are persistent (a new list is returned while the old is left intact.) The Haskell sample uses recursion to traverse the list, while the C sample uses iteration. Again, this is the most natural way to express this function in both languages, but the Haskell sample could easily have been expressed in terms of a fold and the C sample in terms of recursion. Finally, the Haskell function has a polymorphic type, as this is not supported by C we have fixed all type variables to the type constant int
.
In language supporting anonymous functions we can pass such a function as an argument to a higher-order function:
main = map (\x -> 3 * x + 1) [1, 2, 3, 4, 5]
In a language which does not support anonymous functions, we have to bind it to a name instead:
int f(int x) { return 3 * x + 1; } int main() { int l[] = {1, 2, 3, 4, 5}; map(f, l, 5); }
Once we have anonymous or nested functions it becomes natural for them to refer to variables outside of their body (called non-local variables):
main = let a = 3 b = 1 in map (\x -> a * x + b) [1, 2, 3, 4, 5]
If functions are represented as bare function pointers it is no longer obvious how we should pass the value outside of the function body to it. We instead have to manually build a closure and one can at this point no longer speak of "first-class" functions.
typedef struct { int (*f)(int, int, int); int *a; int *b; } closure_t; void map(closure_t *closure, int x[], size_t n) { for (int i = 0; i < n; ++i) x[i] = (*closure->f)(*closure->a, *closure->b, x[i]); } int f(int a, int b, int x) { return a * x + b; } void main() { int l[] = {1, 2, 3, 4, 5}; int a = 3; int b = 1; closure_t closure = {f, &a, &b}; map(&closure, l, 5); }
Also note that the map
is now specialized to functions referring to two int
s outside of their environment. This can be set up more generally, but requires more boilerplate code. If f
would have been a nested function we would still have run into the same problem and this is the reason they are not supported in C.[6]
When returning a function we are in fact returning its closure. In the C example any local variables captured by the closure will go out of scope once we return from the function that builds the closure. Forcing the closure at a later point will result in undefined behaviour, possibly corrupting the stack. This is known as the upwards funarg problem.
Assigning functions to variables and storing them inside (global) datastructures potentially suffers from the same difficulties as returning functions.
f :: [[Integer] -> [Integer]] f = let a = 3 b = 1 in [map (\x -> a * x + b), map (\x -> b * x + a)]
As one can test most literals and values for equality, it is natural to ask of a programming language to support testing functions for equality. On further inspection this question appears more difficult and one has to distinguish between several types of function equality:[7]
In type theory, the type of functions accepting values of type A and returning values of type B may be written as A → B or BA. In the Curry-Howard correspondence, function types are related to logical implication; lambda abstraction corresponds to discharging hypothetical assumptions and function application corresponds to the modus ponens inference rule. Besides the usual case of programming functions, type theory also uses first-class functions to model associative arrays and similar data structures.
In category-theoretical accounts of programming, the availability of first-class functions corresponds to the closed category assumption. For instance, the simply-typed lambda calculus corresponds to the internal language of cartesian closed categories.
Functional programming languages, such as Scheme, ML, Haskell, F#, and Scala all have first-class functions. When Lisp, one of the earliest functional languages, was designed not all aspects of first-class function were properly understood yet, resulting in functions being dynamically scoped. The later Common Lisp dialect, does have lexically-scoped first-class functions.
Many recent interpreted and scripting languages, including Perl, Python, PHP, Lua, Tcl/Tk, JavaScript, Ruby, and Io, have first-class functions.
For the imperative languages a distinction has to be made between the traditional Algol family, the traditional C family, and the modern garbage-collected variants. The Algol family has allowed nested functions and higher-order taking function as arguments, but not higher-order functions that return functions as results. The reason for this was that it was not known how to deal with non-local variables if a nested-function was returned as a result.
The C family allowed both passing functions as arguments and returning them as results, but avoided any problems not supporting nested functions. As the usefulness of returning functions primarily lies in the ability to return nested functions that have captured non-local variables, instead of top-level functions, these language are generally not considered to have first-class functions.
Modern imperative languages often support garbage-collection making the implementation of first-class functions feasible. First-class function have often only been supported in later revisions of the language, including C# 2.0 and Apple's Blocks extension to C, C++ and Objective C. C++11 has added support for anonymous functions and closures to the language, but the non-garbage collected nature of the language and capture semantics of non-local variables still make it problematic for functions to be returned as results.
Language | Higher-order functions | Non-local variables | Partial application | Notes | ||||
---|---|---|---|---|---|---|---|---|
Arguments | Results | Nested functions | Anonymous functions | Closures | ||||
Algol family | ALGOL | Yes | No | Yes | No | No | No | Have function types. |
Pascal | Yes | No | Yes | No | No | No | ||
Oberon | Yes | Non-nested only | Yes | No | No | No | ||
C family | C | Yes | Yes | No | No | No | No | Has function pointers. |
C++ | Yes | Yes | No | C++0x[8] | C++0x[8] | No | Has function pointers, function objects. (Also, see below.) | |
C# | Yes | Yes | No | 2.0 | 2.0 | No | Has delegates. | |
Objective-C | Yes | Yes | No | 2.0 + Blocks[9] | 2.0 + Blocks | No | Has function pointers. | |
Java | No | No | No | No | No | No | Has anonymous inner classes. | |
Functional languages | Lisp | Syntax | Syntax | Yes | Yes | Common Lisp | No | (see below) |
Scheme | Yes | Yes | Yes | Yes | Yes | SRFI 26[10] | ||
Clojure | Yes | Yes | Yes | Yes | Yes | Yes | ||
ML | Yes | Yes | Yes | Yes | Yes | Yes | ||
Haskell | Yes | Yes | Yes | Yes | Yes | Yes | ||
Scala | Yes | Yes | Yes | Yes | Yes | Yes | ||
Scripting languages | JavaScript | Yes | Yes | Yes | Yes | Yes | No | |
PHP | Yes | Yes | Unscoped | 5.3 | 5.3 | No | (see below) | |
Perl | Yes | Yes | 6 | Yes | Yes | 6[11] | ||
Python | Yes | Yes | Yes | Partial | Yes | 2.5[12] | (see below) | |
Ruby | Syntax | Syntax | Unscoped | Yes | Yes | 1.9 | (see below) | |
Other languages | Mathematica | Yes | Yes | Yes | Yes | Yes | No |
(function )
or #'
operators. Conversely, to apply a function encapsulated in such a way as a value, one must use the (funcall )
operator instead of calling it directly.create_function
but could not refer to variable in the outer scope.functools.partial
since version 2.5, and operator.methodcaller
since version 2.6.Method
or Proc
object to be used as first-class data. The syntax for calling such a function object differs from calling regular methods.Proc#curry
.
|